A Multimodal Image Database System

Authors

  • Edward Chang
  • Beitao Li
  • Wei-Cheng Lai
  • Chengwei Chang
  • Kwang-Ting Cheng
  • Michael Crandell
Abstract

We demonstrate PBIR, an integrated system that we have built for conducting multimodal image retrieval. The system combines the strengths of content-based soft annotation (CBSA), multimodal relevance feedback through active learning, and perceptual distance formulation and indexing. PBIR supports multimodal query and annotation in any combination of its three basic modes: seed-by-nothing, seed-by-keywords, and seed-by-content. We demonstrate PBIR on a couple of very large image sets provided by image vendors and crawled from the Internet.

1 Overview

For a search engine to perform effective searches, it has to comprehend what a user wants. Our demonstration presents a multimodal perception-based image retrieval system (PBIR), which can capture users' subjective query concepts thoroughly and hence achieve high search accuracy. PBIR improves over our previous demonstration [2] through the following technologies that we have recently developed: content-based soft annotation, multimodal (textual and perceptual) relevance feedback through active learning, and perceptual distance formulation and indexing.

2 Content-based Soft Annotation

Content-based image retrieval supports image searches based on perceptual features such as color, texture, and shape. However, for most users, articulating a content-based query in these low-level features can be unintuitive and difficult; many users prefer to use keywords to conduct searches. We believe that a combined keyword- and content-based approach can benefit from the strengths of both paradigms. A user can start a query by entering a few keywords. Once some images relevant to the query are found, the system can use those images' perceptual features, together with their annotation, to perform multimodal query refinement. Images must be annotated to support such combined keyword- and content-based queries and refinement. In [3] we propose a content-based soft annotation (CBSA) approach that provides each image with multiple semantic labels. The input to CBSA is a training image set in which each image is manually annotated with a single semantic label. CBSA propagates these labels to unlabeled images as well as to the labeled ones. At the end of the annotation process, each image is annotated with a label-vector, and each label in the vector carries a confidence factor. For instance, each image in a training set is initially labeled with one of K labels such as forest, tiger, or sky. At the end of the CBSA process, each image is annotated with a word vector over the K labels. A label-vector (forest: 0.1, tiger: 0.9, sky: 0.7, ...) means that the image is believed to contain the semantics of forest, tiger, and sky with 10%, 90%, and 70% confidence, respectively. When a text-based search is issued with keywords, images are ranked and retrieved based on their combined confidence factors on the matching labels.

The content-based soft annotation algorithm consists of the following three steps:

1. Manually labeling a set of training images, each with one of the K pre-selected semantic labels.
2. Training K classifiers. Based on the labeled instances, we train an ensemble of K BPM (Bayes Point Machine) binary classifiers, each responsible for determining the confidence factor for one semantic label.
3. Annotating images using the classifiers. Each image is classified by all K classifiers, and each classifier gives the image a confidence factor on the label it is responsible for predicting.

As a result, a K-dimensional vector of class-membership confidences is generated for each image.
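The annotation and keyword-ranking pipeline above can be illustrated with a minimal sketch. It is not the system's actual implementation: it assumes images are already represented by perceptual feature vectors, substitutes scikit-learn logistic regression for the BPM classifiers used in the paper, and combines matching confidence factors by simple addition; the label set and the names train_cbsa, annotate, and keyword_search are illustrative only.

    # CBSA sketch: one binary classifier per semantic label, soft label-vectors,
    # and keyword ranking by combined confidence. Logistic regression stands in
    # for the Bayes Point Machine (BPM) classifiers described in the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    LABELS = ["forest", "tiger", "sky"]   # the K pre-selected semantic labels

    def train_cbsa(train_features, train_labels):
        """Train an ensemble of K one-vs-rest binary classifiers."""
        classifiers = {}
        for label in LABELS:
            y = np.array([1 if l == label else 0 for l in train_labels])
            classifiers[label] = LogisticRegression(max_iter=1000).fit(train_features, y)
        return classifiers

    def annotate(classifiers, features):
        """Return an N x K matrix of confidence factors (the soft label-vectors)."""
        return np.column_stack(
            [classifiers[label].predict_proba(features)[:, 1] for label in LABELS]
        )

    def keyword_search(label_vectors, keywords, top_k=10):
        """Rank images by their combined confidence on the matching labels."""
        cols = [LABELS.index(w) for w in keywords if w in LABELS]
        scores = label_vectors[:, cols].sum(axis=1)   # additive combination (an assumption)
        return np.argsort(-scores)[:top_k]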
3 Multimodal Active Learning

As pointed out in [1], automatic annotation may not attain very high accuracy at the present state of computer vision and image processing. However, providing images with some reliable semantic labels and then refining these unconfirmed labels via relevance feedback is believed to be an effective approach [14]. CBSA initializes images with a set of semantic words significantly better than chance. Our empirical study shows that even though the initial annotation may not be perfect, CBSA helps a user quickly find some relevant images via a keyword search. Once some relevant images have been found, query-refinement methods such as MEGA [8] and SVMActive [12] can be employed to quickly zoom in on the user's query concept. In [11], we show that annotation quality can be improved by using user feedback collected from active-learning sessions. When a user types in a keyword, say W (W can also be collected from annotated images that the user has marked as "relevant" to his or her target concept), we select the images whose membership to keyword W is most difficult for the active-learning algorithm to determine, and use those to so-
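The selection step described above can be sketched as follows, assuming the label-vectors produced by the previous snippet. It uses plain uncertainty sampling (confidence closest to 0.5) rather than the SVMActive margin criterion of [12], and the feedback-update rule is a simplification rather than the refinement method of [11]; the function names are illustrative only.

    # Active-learning sketch: pick the images whose membership to keyword W is
    # hardest to decide, show them to the user, then fold the feedback back in.
    import numpy as np

    def select_for_feedback(label_vectors, labels, keyword, pool_size=20):
        """Return the indices of the most ambiguous images for `keyword`."""
        col = labels.index(keyword)
        uncertainty = np.abs(label_vectors[:, col] - 0.5)   # 0.0 = most ambiguous
        return np.argsort(uncertainty)[:pool_size]

    def refine_annotation(label_vectors, labels, keyword, feedback):
        """Update the soft annotation with user feedback.

        `feedback` maps image index -> True (relevant) or False (not relevant).
        This is a simple hard update, not the procedure developed in [11].
        """
        col = labels.index(keyword)
        for image_id, relevant in feedback.items():
            label_vectors[image_id, col] = 1.0 if relevant else 0.0
        return label_vectors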

Publication date: 2003